Unsupervised Learning: Trade&Ahead¶

Marks: 60

Context¶

The stock market has consistently proven to be a good place to invest in and save for the future. There are a lot of compelling reasons to invest in stocks. It can help in fighting inflation, create wealth, and also provides some tax benefits. Good steady returns on investments over a long period of time can also grow a lot more than seems possible. Also, thanks to the power of compound interest, the earlier one starts investing, the larger the corpus one can have for retirement. Overall, investing in stocks can help meet life's financial aspirations.

It is important to maintain a diversified portfolio when investing in stocks in order to maximise earnings under any market condition. Having a diversified portfolio tends to yield higher returns and face lower risk by tempering potential losses when the market is down. It is often easy to get lost in a sea of financial metrics to analyze while determining the worth of a stock, and doing the same for a multitude of stocks to identify the right picks for an individual can be a tedious task. By doing a cluster analysis, one can identify stocks that exhibit similar characteristics and ones which exhibit minimum correlation. This will help investors better analyze stocks across different market segments and help protect against risks that could make the portfolio vulnerable to losses.

Objective¶

Trade&Ahead is a financial consultancy firm who provide their customers with personalized investment strategies. They have hired you as a Data Scientist and provided you with data comprising stock price and some financial indicators for a few companies listed under the New York Stock Exchange. They have assigned you the tasks of analyzing the data, grouping the stocks based on the attributes provided, and sharing insights about the characteristics of each group.

Data Dictionary¶

  • Ticker Symbol: An abbreviation used to uniquely identify publicly traded shares of a particular stock on a particular stock market
  • Company: Name of the company
  • GICS Sector: The specific economic sector assigned to a company by the Global Industry Classification Standard (GICS) that best defines its business operations
  • GICS Sub Industry: The specific sub-industry group assigned to a company by the Global Industry Classification Standard (GICS) that best defines its business operations
  • Current Price: Current stock price in dollars
  • Price Change: Percentage change in the stock price in 13 weeks
  • Volatility: Standard deviation of the stock price over the past 13 weeks
  • ROE: A measure of financial performance calculated by dividing net income by shareholders' equity (shareholders' equity is equal to a company's assets minus its debt)
  • Cash Ratio: The ratio of a company's total reserves of cash and cash equivalents to its total current liabilities
  • Net Cash Flow: The difference between a company's cash inflows and outflows (in dollars)
  • Net Income: Revenues minus expenses, interest, and taxes (in dollars)
  • Earnings Per Share: Company's net profit divided by the number of common shares it has outstanding (in dollars)
  • Estimated Shares Outstanding: Company's stock currently held by all its shareholders
  • P/E Ratio: Ratio of the company's current stock price to the earnings per share
  • P/B Ratio: Ratio of the company's stock price per share by its book value per share (book value of a company is the net difference between that company's total assets and total liabilities)

Importing necessary libraries and data¶

In [1]:
# Libraries to help with reading and manipulating data
import numpy as np
import pandas as pd

# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style='darkgrid')

# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)

# to scale the data using z-score
from sklearn.preprocessing import StandardScaler

# to compute distances
from scipy.spatial.distance import cdist, pdist

# to perform k-means clustering and compute silhouette scores
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# to visualize the elbow curve and silhouette scores
from yellowbrick.cluster import KElbowVisualizer, SilhouetteVisualizer

# to perform hierarchical clustering, compute cophenetic correlation, and create dendrograms
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage, cophenet

# to suppress warnings
import warnings
warnings.filterwarnings("ignore")

Data Overview¶

  • Observations
  • Sanity checks
In [2]:
data = pd.read_csv('/content/stock_data.csv')
In [3]:
# to check the shape of the data
data.shape
Out[3]:
(340, 15)
  • There are 340 rows, and 15 columns.
In [4]:
# to view a sample of the data
data.sample(n=10, random_state=1)
Out[4]:
Ticker Symbol Security GICS Sector GICS Sub Industry Current Price Price Change Volatility ROE Cash Ratio Net Cash Flow Net Income Earnings Per Share Estimated Shares Outstanding P/E Ratio P/B Ratio
102 DVN Devon Energy Corp. Energy Oil & Gas Exploration & Production 32.000000 -15.478079 2.923698 205 70 830000000 -14454000000 -35.55 4.065823e+08 93.089287 1.785616
125 FB Facebook Information Technology Internet Software & Services 104.660004 16.224320 1.320606 8 958 592000000 3669000000 1.31 2.800763e+09 79.893133 5.884467
11 AIV Apartment Investment & Mgmt Real Estate REITs 40.029999 7.578608 1.163334 15 47 21818000 248710000 1.52 1.636250e+08 26.335526 -1.269332
248 PG Procter & Gamble Consumer Staples Personal Products 79.410004 10.660538 0.806056 17 129 160383000 636056000 3.28 4.913916e+08 24.070121 -2.256747
238 OXY Occidental Petroleum Energy Oil & Gas Exploration & Production 67.610001 0.865287 1.589520 32 64 -588000000 -7829000000 -10.23 7.652981e+08 93.089287 3.345102
336 YUM Yum! Brands Inc Consumer Discretionary Restaurants 52.516175 -8.698917 1.478877 142 27 159000000 1293000000 2.97 4.353535e+08 17.682214 -3.838260
112 EQT EQT Corporation Energy Oil & Gas Exploration & Production 52.130001 -21.253771 2.364883 2 201 523803000 85171000 0.56 1.520911e+08 93.089287 9.567952
147 HAL Halliburton Co. Energy Oil & Gas Equipment & Services 34.040001 -5.101751 1.966062 4 189 7786000000 -671000000 -0.79 8.493671e+08 93.089287 17.345857
89 DFS Discover Financial Services Financials Consumer Finance 53.619999 3.653584 1.159897 20 99 2288000000 2297000000 5.14 4.468872e+08 10.431906 -0.375934
173 IVZ Invesco Ltd. Financials Asset Management & Custody Banks 33.480000 7.067477 1.580839 12 67 412000000 968100000 2.26 4.283628e+08 14.814159 4.218620

Columns:¶

  • Ticker Symbol: An abbreviation used to uniquely identify publicly traded shares of a particular stock on a particular stock market
  • Security/Company Name: Name of the company
  • GICS Sector: The specific economic sector assigned to a company by the Global Industry Classification Standard (GICS) that best defines its business operations
  • GICS Sub Industry: The specific sub-industry group assigned to a company by the Global Industry Classification Standard (GICS) that best defines its business operations
  • Current Price: Current stock price in dollars
  • Price Change: Percentage change in the stock price in 13 weeks
  • Volatility: Standard deviation of the stock price over the past 13 weeks
  • ROE: A measure of financial performance calculated by dividing net income by shareholders' equity (shareholders' equity is equal to a company's assets minus its debt)
  • Cash Ratio: The ratio of a company's total reserves of cash and cash equivalents to its total current liabilities
  • Net Cash Flow: The difference between a company's cash inflows and outflows (in dollars)
  • Net Income: Revenues minus expenses, interest, and taxes (in dollars)
  • Earnings Per Share: Company's net profit divided by the number of common shares it has outstanding (in dollars)
  • Estimated Shares Outstanding: Company's stock currently held by all its shareholders
  • P/E Ratio: Ratio of the company's current stock price to the earnings per share
  • P/B Ratio: Ratio of the company's stock price per share by its book value per share (book value of a company is the net difference between that company's total assets and total liabilities)
In [5]:
# to check column data types
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 340 entries, 0 to 339
Data columns (total 15 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Ticker Symbol                 340 non-null    object 
 1   Security                      340 non-null    object 
 2   GICS Sector                   340 non-null    object 
 3   GICS Sub Industry             340 non-null    object 
 4   Current Price                 340 non-null    float64
 5   Price Change                  340 non-null    float64
 6   Volatility                    340 non-null    float64
 7   ROE                           340 non-null    int64  
 8   Cash Ratio                    340 non-null    int64  
 9   Net Cash Flow                 340 non-null    int64  
 10  Net Income                    340 non-null    int64  
 11  Earnings Per Share            340 non-null    float64
 12  Estimated Shares Outstanding  340 non-null    float64
 13  P/E Ratio                     340 non-null    float64
 14  P/B Ratio                     340 non-null    float64
dtypes: float64(7), int64(4), object(4)
memory usage: 40.0+ KB

Data Types:

  • Object(4)- Ticker Symbol, Security, GICS Sector, GICS Sub Industry
  • Float(7)- Current Price, Price Chanage, Volatility, Earnings Per Share, Estimated Shares Outstanding, P/E Ratio, P/B Ratio
  • Int64 (4)- ROE, Cash Ratio, Net Cash Flow, Net Income

Memory Usuage:

  • 40.0+ KB
In [6]:
# to copy data so original data is unchanged
stocks = data.copy()
In [7]:
# to check for duplicated data
stocks.duplicated().sum()
Out[7]:
0
  • There are no duplicates.
In [8]:
# to check for missing values in the data
stocks.isnull().sum()
Out[8]:
Ticker Symbol                   0
Security                        0
GICS Sector                     0
GICS Sub Industry               0
Current Price                   0
Price Change                    0
Volatility                      0
ROE                             0
Cash Ratio                      0
Net Cash Flow                   0
Net Income                      0
Earnings Per Share              0
Estimated Shares Outstanding    0
P/E Ratio                       0
P/B Ratio                       0
dtype: int64
  • There are no missing values.
In [9]:
# to view the statistical summary of the data
stocks.describe(include='all').T
Out[9]:
count unique top freq mean std min 25% 50% 75% max
Ticker Symbol 340 340 AAL 1 NaN NaN NaN NaN NaN NaN NaN
Security 340 340 American Airlines Group 1 NaN NaN NaN NaN NaN NaN NaN
GICS Sector 340 11 Industrials 53 NaN NaN NaN NaN NaN NaN NaN
GICS Sub Industry 340 104 Oil & Gas Exploration & Production 16 NaN NaN NaN NaN NaN NaN NaN
Current Price 340.0 NaN NaN NaN 80.862345 98.055086 4.5 38.555 59.705 92.880001 1274.949951
Price Change 340.0 NaN NaN NaN 4.078194 12.006338 -47.129693 -0.939484 4.819505 10.695493 55.051683
Volatility 340.0 NaN NaN NaN 1.525976 0.591798 0.733163 1.134878 1.385593 1.695549 4.580042
ROE 340.0 NaN NaN NaN 39.597059 96.547538 1.0 9.75 15.0 27.0 917.0
Cash Ratio 340.0 NaN NaN NaN 70.023529 90.421331 0.0 18.0 47.0 99.0 958.0
Net Cash Flow 340.0 NaN NaN NaN 55537620.588235 1946365312.175789 -11208000000.0 -193906500.0 2098000.0 169810750.0 20764000000.0
Net Income 340.0 NaN NaN NaN 1494384602.941176 3940150279.327936 -23528000000.0 352301250.0 707336000.0 1899000000.0 24442000000.0
Earnings Per Share 340.0 NaN NaN NaN 2.776662 6.587779 -61.2 1.5575 2.895 4.62 50.09
Estimated Shares Outstanding 340.0 NaN NaN NaN 577028337.75403 845849595.417695 27672156.86 158848216.1 309675137.8 573117457.325 6159292035.0
P/E Ratio 340.0 NaN NaN NaN 32.612563 44.348731 2.935451 15.044653 20.819876 31.764755 528.039074
P/B Ratio 340.0 NaN NaN NaN -1.718249 13.966912 -76.119077 -4.352056 -1.06717 3.917066 129.064585
  • Each column has a count of 340 which seems to inidcate no missing data.
  • NaN appears in numerous columns and rows so it may be necessary for further exploration/processing.
  • Non-numerical Columns: Ticker Symbol, Security, GICS Sector, GICS Sub Industry

  • Current Price:

    -Mean: 80.862345

    -Standard Deviation: 98.055086

    -Minimum: 4.5

    -Median: 59.705

    -Maximum: 1,274.949951

  • Price Change:

    -Mean: 4.078194

    -Standard Deviation: 12.006338

    -Minimum: -47.129693

    -Median: 4.819505

    -Maximum: 55.051683

  • Volatility:

    -Mean: 1.525976

    -Standard Deviation: 0.591798

    -Minimum: 0.733163

    -Median: 1.385593

    -Maximum: 4.580042

  • ROE:

    -Mean: 39.597059

    -Standard Deviation: 96.547538

    -Minimum: 1.0

    -Median: 15.0

    -Maximum: 917.0

  • Cash Ratio:

    -Mean: 70.023529

    -Standard Deviation: 90.421331

    -Minimum: 0.0

    -Median: 47.0

    -Maximum: 958.0

  • Net Cash Flow:

    -Mean: 55,537,620.588235

    -Standard Deviation: 1,946,365,312.175789

    -Minimum: -11,208,000,000.0

    -Median: 2,098,000.0

    -Maximum: 20,764,000,000.0

  • Net Income:

    -Mean: 1,494,384,602.941176

    -Standard Deviation: 3,940,150,279.327936

    -Minimum: -23,528,000,000.0

    -Median: 707,336,000.0

    -Maximum: 24,442,000,000.0

  • Earnings Per Share:

    -Mean: 2.776662

    -Standard Deviation: 6.587779

    -Minimum: -61.2

    -Median: 2.895

    -Maximum: 50.09

  • Estimated Shares Outstanding:

    -Mean: 577,028,337.75403

    -Standard Deviation: 845,849,595.417695

    -Minimum: 27,672,156.86

    -Median: 309,675,137.8

    -Maximum: 6,159,292,035.0

  • P/E Ratio:

    -Mean: 32.612563

    -Standard Deviation: 44.348731

    -Minimum: 2.935451

    -Median: 20.819876

    -Maximum: 528.039074

  • P/B Ratio:

    -Mean: -1.718249

    -Standard Deviation: 13.966912

    -Minimum: -76.119077

    -Median: -1.06717

    -Maximum: 129.064585

Exploratory Data Analysis (EDA)¶

  • EDA is an important part of any project involving data.
  • It is important to investigate and understand the data better before building a model with it.
  • A few questions have been mentioned below which will help you approach the analysis in the right manner and generate insights from the data.
  • A thorough analysis of the data, in addition to the questions mentioned below, should be done.

Univariate analysis¶

In [10]:
# function to plot a boxplot and a histogram along the same scale.


def histogram_boxplot(df, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to the show density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=df, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a star will indicate the mean value of the column
    sns.histplot(
        data=df, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
    ) if bins else sns.histplot(
        data=df, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        df[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        df[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
In [11]:
# to view a histogram boxplot of Current Price
histogram_boxplot(stocks, 'Current Price')
  • Current Price is right-skewed
In [12]:
# to view a histogram boxplot of Price Change
histogram_boxplot(stocks, 'Price Change')
  • Price Change has an almost normal distribution
In [13]:
# to view a histogram boxplot of Volatility
histogram_boxplot(stocks, 'Volatility')
  • Volatility is right-skewed
In [14]:
# to view a histogram boxplot of ROE
histogram_boxplot(stocks, 'ROE')
  • ROE is right-skewed
In [15]:
# to view a histogram boxplot of Cash Ratio
histogram_boxplot(stocks, 'Cash Ratio')
  • Cash Ratio is right-skewed
In [16]:
# to view a histogram boxplot of Net Cash Flow
histogram_boxplot(stocks, 'Net Cash Flow')
  • Net Cash Flow is relatively normally distributed.
In [17]:
# to view a histogram boxplot of Net Income
histogram_boxplot(stocks, 'Net Income')
  • Net Income is relatively normally distributed.
In [18]:
# to view a histogram boxplot of Earnings Per Share
histogram_boxplot(stocks, 'Earnings Per Share')
  • Earnings Per Share is slightly left-skewed.
In [19]:
# to view a histogram boxplot of Estimated Shares Outstanding
histogram_boxplot(stocks, 'Current Price')
  • Current Price is right-skewed
In [20]:
# to view a histogram boxplot of P/E Ratio
histogram_boxplot(stocks, 'P/E Ratio')
  • P/E Ratio is right-skewed.
In [21]:
# to view a histogram boxplot of P/B Ratio
histogram_boxplot(stocks, 'P/B Ratio')
  • P/B Ratio is relatively normally distributed.
In [22]:
# function to create labeled barplots


def labeled_barplot(df, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """

    total = len(df[feature])  # length of the column
    count = df[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=df,
        x=feature,
        palette="Paired",
        order=df[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc == True:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # width of the plot
        y = p.get_height()  # height of the plot

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage

    plt.show()  # show the plot
In [23]:
# to view labeled barplot for GICS Sector
labeled_barplot(stocks, 'GICS Sector', perc=True)
  • Percentage of data by GICS Sector:
    • Consumer Discretionary: 11.8%
    • Consumer Staples: 5.6%
    • Energy: 8.8%
    • Financials: 14.4%
    • Health Care: 11.8%
    • Industrials: 15.6%
    • Information Technology: 9.7%
    • Materials: 5.9%
    • Real Estate: 7.9%
    • Telecommunications Services: 1.5%
    • Utilities: 7.1%
In [24]:
# to view labeled barplot for GICS Sub Industry
labeled_barplot(stocks, 'GICS Sub Industry', perc=True)
  • Percentage of data by GICS Sub Industry above 1.0%
    • Aerospace & Defense: 1.2%
    • Airlines: 1.5%
    • Asset Management & Custody Banks: 1.2%
    • Banks: 2.9%
    • Biotechnology: 2.1%
    • Building Products: 1.2%
    • Consumer Finance: 1.5%
    • Diversified Chemicals: 1.5%
    • Diversified Financial Services: 2.1%
    • Electric Utilities: 3.5%
    • Health Care Equipment: 3.2%
    • Health Care Facilities: 1.5%
    • Hotels, Resorts, & Cruise Lines: 1.2%
    • Industrial Conglomerates: 4.1%
    • Industrial Machinery: 1.5%
    • Integrated Oil & Gas: 1.5%
    • Integrated Telecommunications Services: 1.2%
    • Internet & Direct Marketing Retail: 1.2%
    • Internet Software & Services: 3.5%
    • Managed Health Care: 1.5%
    • MultiUtilities: 3.2%
    • Oil & Gas Exploration & Production: 4.7%
    • Oil & Gas Refining & Marketing & Transportation: 1.8%
    • Packaged Foods & Meats: 1.8%
    • Pharmaceuticals: 1.8%
    • Property & Casualty Insurance: 2.4%
    • REITs: 4.1%
    • Railroads: 1.2%
    • Research & Consulting Services: 1.2%
    • Residential REITs: 1.2%
    • Retail REITs: 1.2%
    • Semiconductors: 1.8%
    • Soft Drinks: 1.2%
    • Specialty Chemicals: 1.2%

Bivariate Analysis¶

Questions:

  1. What does the distribution of stock prices look like?
In [25]:
# to create a lineplot of current price and security
plt.figure(figsize=(100,5))
sns.lineplot(data=stocks, x='Security', y='Current Price')
plt.xticks(rotation=90)
plt.show()
  1. The stocks of which economic sector have seen the maximum price increase on average?
In [26]:
# to create a barplot of GICS Sector and Price Change
plt.figure(figsize=(15,8))
sns.barplot(data=stocks, x='GICS Sector', y='Price Change', ci=False)
plt.xticks(rotation=90)
plt.show()
  • The Health Care sector has seen the highest price increase on average.
  1. How are the different variables correlated with each other?
In [27]:
# correlation check
plt.figure(figsize=(15, 7))
sns.heatmap(
    stocks.corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral"
)
plt.show()

Correlation:¶

Net Income:

  • Net Income and Estimated Shares Outstanding: 0.59
  • Net Income and Earnings Per Share: 0.56
  • Net Income and P/E Ratio: -0.22
  • Net Income and ROE: -0.29
  • Net Income and Volatility: -0.38

Earnings Per Share:

  • Earnings Per Share and Current Price: 0.48
  • Earnings Per Share and P/E Ratio: -0.26
  • Earnings Per Share and Volatility: -0.38
  • Earnings Per Share and ROE: -0.41

Price Change

  • Price Change and Volatility: -0.41
  1. Cash ratio provides a measure of a company's ability to cover its short-term obligations using only cash and cash equivalents. How does the average cash ratio vary across economic sectors?
In [28]:
# to view Cash Ratios by GICS Sector as a barplot
plt.figure(figsize=(15,8))
sns.barplot(data=stocks, x='GICS Sector', y='Cash Ratio', ci=False)
plt.xticks(rotation=90)
plt.show()

Cash Ratio Rankings:

  • Highest - Information Technology
  • 2nd Highest - Telecommunications Services
  • 3rd Highest - Health Care
  • 4th - Financials
  • 5th - Consumer Staples
  • 6th - Real Estate
  • 7th - Energy
  • 8th - Consumer Discretionary
  • 9th - Materials
  • 10th - Industrials
  • 11th - Utilities
  1. P/E ratios can help determine the relative value of a company's shares as they signify the amount of money an investor is willing to invest in a single share of a company per dollar of its earnings. How does the P/E ratio vary, on average, across economic sectors?
In [29]:
# to view P/E Ratios by GICS Sector as a barplot
plt.figure(figsize=(15,8))
sns.barplot(data=stocks, x='GICS Sector', y='P/E Ratio', ci=False)
plt.xticks(rotation=90)
plt.show()

P/E Ratio Rankings (Highest to Lowest):

  • Energy
  • Information Technology
  • Real Estate
  • Health Care
  • Consumer Discretionary
  • Consumer Staples
  • Materials
  • Utilities
  • Industrials
  • Financials
  • Telecommunication Services

Data Preprocessing¶

  • Duplicate value check
  • Missing value treatment
  • Outlier check
  • Feature engineering (if needed)
  • Any other preprocessing steps (if needed)
In [30]:
# to check for duplicated data
stocks.duplicated().sum()
Out[30]:
0
  • There are no duplicate values.
In [31]:
# to check for missing values
stocks.isnull().sum()
Out[31]:
Ticker Symbol                   0
Security                        0
GICS Sector                     0
GICS Sub Industry               0
Current Price                   0
Price Change                    0
Volatility                      0
ROE                             0
Cash Ratio                      0
Net Cash Flow                   0
Net Income                      0
Earnings Per Share              0
Estimated Shares Outstanding    0
P/E Ratio                       0
P/B Ratio                       0
dtype: int64
  • There are no missing values.

Outlier Check¶

In [32]:
# to plot the boxplots of all numerical columns to check for outliers
plt.figure(figsize=(15, 12))

numeric_columns = stocks.select_dtypes(include=np.number).columns.tolist()

for i, variable in enumerate(numeric_columns):
    plt.subplot(3, 4, i + 1)
    plt.boxplot(stocks[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)

plt.show()
  • No need to treat outliers as the data is relevant and accurate.

Scaling¶

In [33]:
# scaling the data before clustering
scaler = StandardScaler()
subset = stocks[numeric_columns].copy()
subset_scaled = scaler.fit_transform(subset)
In [34]:
# creating a dataframe of the scaled data
subset_scaled_df = pd.DataFrame(subset_scaled, columns=subset.columns)

K-means Clustering¶

Checking Elbow Plot¶

In [35]:
k_means_df = subset_scaled_df.copy()
In [36]:
clusters = range(1, 15)
meanDistortions = []

for k in clusters:
    model = KMeans(n_clusters=k, random_state=1)
    model.fit(subset_scaled_df)
    prediction = model.predict(k_means_df)
    distortion = (
        sum(np.min(cdist(k_means_df, model.cluster_centers_, "euclidean"), axis=1))
        / k_means_df.shape[0]
    )

    meanDistortions.append(distortion)

    print("Number of Clusters:", k, "\tAverage Distortion:", distortion)

plt.plot(clusters, meanDistortions, "bx-")
plt.xlabel("k")
plt.ylabel("Average Distortion")
plt.title("Selecting k with the Elbow Method", fontsize=20)
plt.show()
Number of Clusters: 1 	Average Distortion: 2.5425069919221697
Number of Clusters: 2 	Average Distortion: 2.382318498894466
Number of Clusters: 3 	Average Distortion: 2.2692367155390745
Number of Clusters: 4 	Average Distortion: 2.1745559827866363
Number of Clusters: 5 	Average Distortion: 2.128799332840716
Number of Clusters: 6 	Average Distortion: 2.080400099226289
Number of Clusters: 7 	Average Distortion: 2.0289794220177395
Number of Clusters: 8 	Average Distortion: 1.964144163389972
Number of Clusters: 9 	Average Distortion: 1.9221492045198068
Number of Clusters: 10 	Average Distortion: 1.8513913649973124
Number of Clusters: 11 	Average Distortion: 1.8024134734578485
Number of Clusters: 12 	Average Distortion: 1.7900931879652673
Number of Clusters: 13 	Average Distortion: 1.7417609203336912
Number of Clusters: 14 	Average Distortion: 1.673559857259703
  • Number of Clusters:

    • 6 Average Distortion: 2.080400099226289
    • 7 Average Distortion: 2.0289794220177395
    • 8 Average Distortion: 1.964144163389972
    • 9 Average Distortion: 1.9221492045198068
    • 10 Average Distortion: 1.8513913649973124
    • 11 Average Distortion: 1.8024134734578485
    • 12 Average Distortion: 1.7900931879652673
    • 13 Average Distortion: 1.7417609203336912
    • 14 Average Distortion: 1.673559857259703
  • Elbow may be optimal somewhere between 6 and 11 clusters.

In [37]:
model = KMeans(random_state=1)
visualizer = KElbowVisualizer(model, k=(1, 15), timings=True)
visualizer.fit(k_means_df)  # fit the data to the visualizer
visualizer.show()  # finalize and render figure
Out[37]:
<Axes: title={'center': 'Distortion Score Elbow for KMeans Clustering'}, xlabel='k', ylabel='distortion score'>
  • Elbow looks optimal at 6 clusters.

Silhouette Scores¶

In [38]:
sil_score = []
cluster_list = range(2, 15)
for n_clusters in cluster_list:
    clusterer = KMeans(n_clusters=n_clusters, random_state=1)
    preds = clusterer.fit_predict((subset_scaled_df))
    score = silhouette_score(k_means_df, preds)
    sil_score.append(score)
    print("For n_clusters = {}, the silhouette score is {})".format(n_clusters, score))

plt.plot(cluster_list, sil_score)
plt.show()
For n_clusters = 2, the silhouette score is 0.43969639509980457)
For n_clusters = 3, the silhouette score is 0.4644405674779404)
For n_clusters = 4, the silhouette score is 0.4577225970476733)
For n_clusters = 5, the silhouette score is 0.43228336443659804)
For n_clusters = 6, the silhouette score is 0.4005422737213617)
For n_clusters = 7, the silhouette score is 0.3976335364987305)
For n_clusters = 8, the silhouette score is 0.40278401969450467)
For n_clusters = 9, the silhouette score is 0.3778585981433699)
For n_clusters = 10, the silhouette score is 0.13458938329968687)
For n_clusters = 11, the silhouette score is 0.1421832155528444)
For n_clusters = 12, the silhouette score is 0.2044669621527429)
For n_clusters = 13, the silhouette score is 0.23424874810104204)
For n_clusters = 14, the silhouette score is 0.12102526472829901)
  • Silhouette Scores (n_clusters):

    • 6 - the silhouette score is 0.4005422737213617
    • 7 - the silhouette score is 0.3976335364987305
    • 8 - the silhouette score is 0.40278401969450467
    • 9 - the silhouette score is 0.3778585981433699
    • 10 - the silhouette score is 0.13458938329968687
    • 11 - the silhouette score is 0.1421832155528444
  • Silhouette Scores divest by more than ~.02 from 8 to 9 clusters and 9 to 10 clusters.

In [39]:
model = KMeans(random_state=1)
visualizer = KElbowVisualizer(model, k=(2, 15), metric="silhouette", timings=True)
visualizer.fit(k_means_df)  # fit the data to the visualizer
visualizer.show()  # finalize and render figure
Out[39]:
<Axes: title={'center': 'Silhouette Score Elbow for KMeans Clustering'}, xlabel='k', ylabel='silhouette score'>
  • Silhouette Score Elbow for KMeans Clustering: 2 clusters
In [40]:
# finding optimal no. of clusters with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(10, random_state=1))
visualizer.fit(k_means_df)
visualizer.show()
Out[40]:
<Axes: title={'center': 'Silhouette Plot of KMeans Clustering for 340 Samples in 10 Centers'}, xlabel='silhouette coefficient values', ylabel='cluster label'>
  • The clusters seem to be the least dense at 10 clusters over the 340 samples.

Creating Final Model¶

In [41]:
# final K-means model
kmeans = KMeans(n_clusters= 10, random_state=1)
kmeans.fit(k_means_df)
Out[41]:
KMeans(n_clusters=10, random_state=1)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
KMeans(n_clusters=10, random_state=1)
In [42]:
# creating a copy of the original data
stocks1 = stocks.copy()

# adding kmeans cluster labels to the original and scaled dataframes
k_means_df['KM_segments'] = kmeans.labels_
stocks1['KM_segments'] = kmeans.labels_

Cluster Profiling¶

In [43]:
km_cluster_profile = stocks1.groupby('KM_segments').mean()
In [44]:
km_cluster_profile['count_in_each_segment'] = (
    stocks1.groupby('KM_segments')["Security"].count().values
)
In [45]:
km_cluster_profile.style.highlight_max(color="lightblue", axis=0)
Out[45]:
  Current Price Price Change Volatility ROE Cash Ratio Net Cash Flow Net Income Earnings Per Share Estimated Shares Outstanding P/E Ratio P/B Ratio count_in_each_segment
KM_segments                        
0 62.644030 12.720586 1.529654 29.223404 61.319149 -156258638.297872 1919175936.170213 3.399149 636937648.906064 23.345566 0.739498 94
1 76.374133 0.834108 1.297704 23.023121 47.121387 115498173.410405 1390461699.421965 3.851069 337271215.505665 23.384698 -5.428802 173
2 46.672222 5.166566 1.079367 25.000000 58.333333 -3040666666.666667 14848444444.444445 3.435556 4564959946.222222 15.596051 -6.354193 9
3 327.006671 21.917380 2.029752 4.000000 106.000000 698240666.666667 287547000.000000 0.750000 366763235.300000 400.989188 -5.322376 3
4 108.304002 10.737770 1.165694 566.200000 26.600000 -278760000.000000 687180000.000000 1.548000 349607057.720000 34.898915 -16.851358 5
5 25.640000 11.237908 1.322355 12.500000 130.500000 16755500000.000000 13654000000.000000 3.295000 2791829362.100000 13.649696 1.508484 2
6 75.775186 14.419381 1.854929 29.111111 338.555556 696745611.111111 935969944.444444 2.005000 792523728.361111 44.919121 8.778016 18
7 508.534992 5.732177 1.504640 27.250000 150.875000 37895875.000000 1116994125.000000 15.965000 75654420.935000 43.727459 29.581664 8
8 24.485001 -13.351992 3.482611 802.000000 51.000000 -1292500000.000000 -19106500000.000000 -41.815000 519573983.250000 60.748608 1.565141 2
9 35.263847 -16.175693 2.841300 49.769231 48.153846 -135215038.461538 -2525946153.846154 -6.514231 482428533.751538 77.817252 1.618150 26

Maximum Value of Column at a Particular KM_segment:

KM_segments

0: None

1: Count_in_each_segment (173)

2: Net Income (14,848,444,444.444445) and Estimated Shares Outstanding (4,564,959,946.222222)

3: Price Change (21.917380) and P/E Ratio (400.989188)

4: None

5: Net Cash Flow (16,755,500,000.000000 )

6: Cash Ratio (338.555556)

7: Current Price (508.534992), Earnings Per Share (15.965000), and P/B Ratio (29.581664)

8: Volatility (3.482611) and ROE (802.000000)

9: None

In [46]:
# to print the companies in each cluster
for cl in stocks1["KM_segments"].unique():
    print("In cluster {}, the following companies are present:".format(cl))
    print(stocks1[stocks1["KM_segments"] == cl]["Security"].unique())
    print('-' * 100, '\n')
In cluster 0, the following companies are present:
['American Airlines Group' 'AbbVie' 'Abbott Laboratories'
 'Adobe Systems Inc' 'American International Group, Inc.' 'Albemarle Corp'
 'Applied Materials Inc' 'Arconic Inc' 'Activision Blizzard' 'Broadcom'
 'Boeing Company' 'Baxter International Inc.'
 'The Bank of New York Mellon Corp.' 'Ball Corp' 'Bristol-Myers Squibb'
 'Boston Scientific' 'BorgWarner' 'Caterpillar Inc.'
 'Citizens Financial Group' 'Centene Corporation' 'Citrix Systems'
 'Chevron Corp.' 'Delta Air Lines' 'Du Pont (E.I.)' 'Deere & Co.'
 'Quest Diagnostics' 'Delphi Automotive' 'Digital Realty Trust'
 'Dr Pepper Snapple Group' 'E*Trade' 'Fastenal Co'
 'Fortune Brands Home & Security' 'Fluor Corp.' 'FMC Corporation'
 'Gilead Sciences' 'Corning Inc.' 'General Motors' 'Garmin Ltd.'
 'Goodyear Tire & Rubber' 'Huntington Bancshares' "Honeywell Int'l Inc."
 'HP Inc.' 'Hormel Foods Corp.' 'Henry Schein' 'Host Hotels & Resorts'
 'Intl Flavors & Fragrances' 'Interpublic Group' 'Illinois Tool Works'
 'Invesco Ltd.' 'Jacobs Engineering Group' 'Juniper Networks'
 'Kimco Realty' 'Laboratory Corp. of America Holding'
 'L-3 Communications Holdings' 'Southwest Airlines'
 'Level 3 Communications' 'LyondellBasell' 'Mastercard Inc.' 'Masco Corp.'
 'Mattel Inc.' 'Mondelez International' 'Mead Johnson' 'Altria Group Inc'
 'Marathon Petroleum' 'Merck & Co.' 'Mylan N.V.' 'Navient'
 'Norfolk Southern Corp.' 'Nucor Corp.' 'Newell Brands'
 'Philip Morris International' 'PPG Industries' 'Phillips 66' 'PayPal'
 'Roper Industries' 'Charles Schwab Corporation' 'Sherwin-Williams'
 'Scripps Networks Interactive Inc.' 'SunTrust Banks' 'Tegna, Inc.'
 'The Travelers Companies Inc.' 'Tyson Foods' 'Tesoro Petroleum Co.'
 'Total System Services' 'Texas Instruments' 'Varian Medical Systems'
 'Valero Energy' 'Vulcan Materials' 'Verisign Inc.' 'Weyerhaeuser Corp.'
 'Dentsply Sirona' 'Xerox Corp.' 'Xylem Inc.' 'Zoetis']
---------------------------------------------------------------------------------------------------- 

In cluster 6, the following companies are present:
['Analog Devices, Inc.' 'Amgen Inc' 'Celgene Corp.' 'eBay Inc.'
 'Edwards Lifesciences' 'Facebook' 'First Solar Inc'
 'Frontier Communications' 'Halliburton Co.' "McDonald's Corp."
 'Monster Beverage' 'Newmont Mining Corp. (Hldg. Co.)'
 'Skyworks Solutions' 'TripAdvisor' 'Vertex Pharmaceuticals Inc'
 'Waters Corporation' 'Wynn Resorts Ltd' 'Yahoo Inc.']
---------------------------------------------------------------------------------------------------- 

In cluster 1, the following companies are present:
['Archer-Daniels-Midland Co' 'Ameren Corp' 'American Electric Power'
 'AFLAC Inc' 'Apartment Investment & Mgmt' 'Assurant Inc'
 'Arthur J. Gallagher & Co.' 'Akamai Technologies Inc'
 'Alaska Air Group Inc' 'Allstate Corp' 'AMETEK Inc'
 'Affiliated Managers Group Inc' 'Ameriprise Financial'
 'American Tower Corp A' 'AutoNation Inc' 'Anthem Inc.' 'Aon plc'
 'Amphenol Corp' 'AvalonBay Communities, Inc.'
 'American Water Works Company Inc' 'American Express Co'
 'BB&T Corporation' 'Bard (C.R.) Inc.' 'Boston Properties' 'Chubb Limited'
 'CBRE Group' 'Crown Castle International Corp.' 'Carnival Corp.'
 'CF Industries Holdings Inc' 'Church & Dwight' 'C. H. Robinson Worldwide'
 'CIGNA Corp.' 'Cincinnati Financial' 'Comerica Inc.' 'CME Group Inc.'
 'Cummins Inc.' 'CMS Energy' 'CenterPoint Energy' 'Capital One Financial'
 'The Cooper Companies' 'CSX Corp.' 'CenturyLink Inc'
 'Cognizant Technology Solutions' 'CVS Health' 'Dominion Resources'
 'Discover Financial Services' 'Danaher Corp.' 'The Walt Disney Company'
 'Discovery Communications-A' 'Discovery Communications-C'
 'Dun & Bradstreet' 'Dover Corp.' 'Duke Energy' 'DaVita Inc.'
 'Ecolab Inc.' 'Consolidated Edison' 'Equifax Inc.' "Edison Int'l"
 'Eastman Chemical' 'Equity Residential' 'Eversource Energy'
 'Essex Property Trust, Inc.' 'Eaton Corporation' 'Entergy Corp.'
 'Exelon Corp.' "Expeditors Int'l" 'Expedia Inc.' 'Extra Space Storage'
 'FirstEnergy Corp' 'Fidelity National Information Services' 'Fiserv Inc'
 'FLIR Systems' 'Flowserve Corporation' 'Federal Realty Investment Trust'
 'General Dynamics' 'General Growth Properties Inc.' 'Genuine Parts'
 'Grainger (W.W.) Inc.' 'Hasbro Inc.' 'HCA Holdings' 'Welltower Inc.'
 'HCP Inc.' 'Hartford Financial Svc.Gp.' 'Harley-Davidson'
 'The Hershey Company' 'Humana Inc.' 'International Business Machines'
 'IDEXX Laboratories' 'International Paper' 'Iron Mountain Incorporated'
 'J. B. Hunt Transport Services' 'Kansas City Southern' 'Leggett & Platt'
 'Lennar Corp.' 'LKQ Corporation' 'Lilly (Eli) & Co.'
 'Lockheed Martin Corp.' 'Alliant Energy Corp' 'Leucadia National Corp.'
 'Mid-America Apartments' 'Macerich' "Marriott Int'l." "Moody's Corp"
 'MetLife Inc.' 'Mohawk Industries' 'McCormick & Co.'
 'Martin Marietta Materials' 'Marsh & McLennan' '3M Company'
 'M&T Bank Corp.' 'NASDAQ OMX Group' 'NextEra Energy' 'Nielsen Holdings'
 'Northern Trust Corp.' 'Realty Income Corporation' 'Omnicom Group'
 "O'Reilly Automotive" "People's United Financial" 'Pitney-Bowes'
 'PACCAR Inc.' 'PG&E Corp.' 'Public Serv. Enterprise Inc.' 'PepsiCo Inc.'
 'Principal Financial Group' 'Procter & Gamble' 'Progressive Corp.'
 'Pulte Homes Inc.' 'PNC Financial Services' 'Pentair Ltd.'
 'Pinnacle West Capital' 'PPL Corp.' 'Prudential Financial' 'Praxair Inc.'
 'Ryder System' 'Royal Caribbean Cruises Ltd' 'Robert Half International'
 'Republic Services Inc' 'SCANA Corp' 'Sealed Air' 'SL Green Realty'
 'Southern Co.' 'Simon Property Group Inc' 'Stericycle Inc'
 'Sempra Energy' 'State Street Corp.' 'Synchrony Financial'
 'Stryker Corp.' 'Molson Coors Brewing Company' 'Torchmark Corp.'
 'Thermo Fisher Scientific' 'Tractor Supply Company' 'Under Armour'
 'United Continental Holdings' 'UDR Inc' 'Universal Health Services, Inc.'
 'United Health Group Inc.' 'Unum Group' 'Union Pacific'
 'United Parcel Service' 'United Technologies' 'Vornado Realty Trust'
 'Verisk Analytics' 'Ventas Inc' 'Wec Energy Group Inc' 'Whirlpool Corp.'
 'Waste Management Inc.' 'Western Union Co' 'Wyndham Worldwide'
 'Xcel Energy Inc' 'XL Capital' 'Yum! Brands Inc' 'Zimmer Biomet Holdings'
 'Zions Bancorp']
---------------------------------------------------------------------------------------------------- 

In cluster 7, the following companies are present:
['Alliance Data Systems' 'BIOGEN IDEC Inc.' 'Chipotle Mexican Grill'
 'Equinix' 'Intuitive Surgical Inc.' 'Mettler Toledo' 'Priceline.com Inc'
 'Regeneron']
---------------------------------------------------------------------------------------------------- 

In cluster 4, the following companies are present:
['Allegion' 'Charter Communications' 'Colgate-Palmolive' 'Kimberly-Clark'
 'S&P Global, Inc.']
---------------------------------------------------------------------------------------------------- 

In cluster 3, the following companies are present:
['Alexion Pharmaceuticals' 'Amazon.com Inc' 'Netflix Inc.']
---------------------------------------------------------------------------------------------------- 

In cluster 8, the following companies are present:
['Apache Corporation' 'Chesapeake Energy']
---------------------------------------------------------------------------------------------------- 

In cluster 9, the following companies are present:
['Anadarko Petroleum Corp' 'Baker Hughes Inc' 'Cabot Oil & Gas'
 'Concho Resources' 'Devon Energy Corp.' 'EOG Resources' 'EQT Corporation'
 'Freeport-McMoran Cp & Gld' 'Hess Corporation'
 'Hewlett Packard Enterprise' 'Kinder Morgan' 'The Mosaic Company'
 'Marathon Oil Corp.' 'Murphy Oil' 'Noble Energy Inc'
 'Newfield Exploration Co' 'National Oilwell Varco Inc.' 'ONEOK'
 'Occidental Petroleum' 'Quanta Services Inc.' 'Range Resources Corp.'
 'Spectra Energy Corp.' 'Southwestern Energy' 'Teradata Corp.'
 'Williams Cos.' 'Cimarex Energy']
---------------------------------------------------------------------------------------------------- 

In cluster 5, the following companies are present:
['Bank of America Corp' 'Intel Corp.']
---------------------------------------------------------------------------------------------------- 

In cluster 2, the following companies are present:
['Citigroup Inc.' 'Ford Motor' 'JPMorgan Chase & Co.' 'Coca Cola Company'
 'Pfizer Inc.' 'AT&T Inc' 'Verizon Communications' 'Wells Fargo'
 'Exxon Mobil Corp.']
---------------------------------------------------------------------------------------------------- 

In [47]:
stocks1.groupby(["KM_segments", "GICS Sector"])['Security'].count()
Out[47]:
KM_segments  GICS Sector                
0            Consumer Discretionary         10
             Consumer Staples                7
             Energy                          5
             Financials                     10
             Health Care                    15
             Industrials                    18
             Information Technology         14
             Materials                      10
             Real Estate                     4
             Telecommunications Services     1
1            Consumer Discretionary         22
             Consumer Staples                8
             Financials                     34
             Health Care                    14
             Industrials                    33
             Information Technology          8
             Materials                       7
             Real Estate                    22
             Telecommunications Services     1
             Utilities                      24
2            Consumer Discretionary          1
             Consumer Staples                1
             Energy                          1
             Financials                      3
             Health Care                     1
             Telecommunications Services     2
3            Consumer Discretionary          1
             Health Care                     1
             Information Technology          1
4            Consumer Discretionary          1
             Consumer Staples                2
             Financials                      1
             Industrials                     1
5            Financials                      1
             Information Technology          1
6            Consumer Discretionary          3
             Consumer Staples                1
             Energy                          1
             Health Care                     5
             Information Technology          6
             Materials                       1
             Telecommunications Services     1
7            Consumer Discretionary          2
             Health Care                     4
             Information Technology          1
             Real Estate                     1
8            Energy                          2
9            Energy                         21
             Industrials                     1
             Information Technology          2
             Materials                       2
Name: Security, dtype: int64
In [48]:
plt.figure(figsize=(20, 20))
plt.suptitle("Boxplot of numerical variables for each cluster")

# selecting numerical columns
num_col = stocks.select_dtypes(include=np.number).columns.tolist()

for i, variable in enumerate(num_col):
    plt.subplot(3, 4, i + 1)
    sns.boxplot(data=stocks1, x="KM_segments", y=variable)

plt.tight_layout(pad=2.0)

Hierarchical Clustering¶

Computing Cophenetic Correlation¶

In [49]:
hc_df = subset_scaled_df.copy()
In [50]:
# list of distance metrics
distance_metrics = ["euclidean", "chebyshev", "mahalanobis", "cityblock"]

# list of linkage methods
linkage_methods = ["single", "complete", "average", "weighted"]

high_cophenet_corr = 0
high_dm_lm = [0, 0]

for dm in distance_metrics:
    for lm in linkage_methods:
        Z = linkage(hc_df, metric=dm, method=lm)
        c, coph_dists = cophenet(Z, pdist(hc_df))
        print(
            "Cophenetic correlation for {} distance and {} linkage is {}.".format(
                dm.capitalize(), lm, c
            )
        )
        if high_cophenet_corr < c:
            high_cophenet_corr = c
            high_dm_lm[0] = dm
            high_dm_lm[1] = lm

# printing the combination of distance metric and linkage method with the highest cophenetic correlation
print('*'*100)
print(
    "Highest cophenetic correlation is {}, which is obtained with {} distance and {} linkage.".format(
        high_cophenet_corr, high_dm_lm[0].capitalize(), high_dm_lm[1]
    )
)
Cophenetic correlation for Euclidean distance and single linkage is 0.9232271494002922.
Cophenetic correlation for Euclidean distance and complete linkage is 0.7873280186580672.
Cophenetic correlation for Euclidean distance and average linkage is 0.9422540609560814.
Cophenetic correlation for Euclidean distance and weighted linkage is 0.8693784298129404.
Cophenetic correlation for Chebyshev distance and single linkage is 0.9062538164750717.
Cophenetic correlation for Chebyshev distance and complete linkage is 0.598891419111242.
Cophenetic correlation for Chebyshev distance and average linkage is 0.9338265528030499.
Cophenetic correlation for Chebyshev distance and weighted linkage is 0.9127355892367.
Cophenetic correlation for Mahalanobis distance and single linkage is 0.925919553052459.
Cophenetic correlation for Mahalanobis distance and complete linkage is 0.7925307202850002.
Cophenetic correlation for Mahalanobis distance and average linkage is 0.9247324030159736.
Cophenetic correlation for Mahalanobis distance and weighted linkage is 0.8708317490180428.
Cophenetic correlation for Cityblock distance and single linkage is 0.9334186366528574.
Cophenetic correlation for Cityblock distance and complete linkage is 0.7375328863205818.
Cophenetic correlation for Cityblock distance and average linkage is 0.9302145048594667.
Cophenetic correlation for Cityblock distance and weighted linkage is 0.731045513520281.
****************************************************************************************************
Highest cophenetic correlation is 0.9422540609560814, which is obtained with Euclidean distance and average linkage.
In [51]:
# printing the combination of distance metric and linkage method with the highest cophenetic correlation
print(
    "Highest cophenetic correlation is {}, which is obtained with {} distance and {} linkage.".format(
        high_cophenet_corr, high_dm_lm[0].capitalize(), high_dm_lm[1]
    )
)
Highest cophenetic correlation is 0.9422540609560814, which is obtained with Euclidean distance and average linkage.

Euclidean Distance Exploration¶

In [52]:
# list of linkage methods
linkage_methods = ["single", "complete", "average", "centroid", "ward", "weighted"]

high_cophenet_corr = 0
high_dm_lm = [0, 0]

for lm in linkage_methods:
    Z = linkage(subset_scaled_df, metric="euclidean", method=lm)
    c, coph_dists = cophenet(Z, pdist(subset_scaled_df))
    print("Cophenetic correlation for {} linkage is {}.".format(lm, c))
    if high_cophenet_corr < c:
        high_cophenet_corr = c
        high_dm_lm[0] = "euclidean"
        high_dm_lm[1] = lm
Cophenetic correlation for single linkage is 0.9232271494002922.
Cophenetic correlation for complete linkage is 0.7873280186580672.
Cophenetic correlation for average linkage is 0.9422540609560814.
Cophenetic correlation for centroid linkage is 0.9314012446828154.
Cophenetic correlation for ward linkage is 0.7101180299865353.
Cophenetic correlation for weighted linkage is 0.8693784298129404.
In [53]:
# printing the combination of distance metric and linkage method with the highest cophenetic correlation
print(
    "Highest cophenetic correlation is {}, which is obtained with {} linkage.".format(
        high_cophenet_corr, high_dm_lm[1]
    )
)
Highest cophenetic correlation is 0.9422540609560814, which is obtained with average linkage.

Checking Dendrograms¶

In [54]:
# list of linkage methods
linkage_methods = ["single", "complete", "average", "centroid", "ward", "weighted"]

# lists to save results of cophenetic correlation calculation
compare_cols = ["Linkage", "Cophenetic Coefficient"]
compare = []

# to create a subplot image
fig, axs = plt.subplots(len(linkage_methods), 1, figsize=(15, 30))

# We will enumerate through the list of linkage methods above
# For each linkage method, we will plot the dendrogram and calculate the cophenetic correlation
for i, method in enumerate(linkage_methods):
    Z = linkage(subset_scaled_df, metric="euclidean", method=method)

    dendrogram(Z, ax=axs[i])
    axs[i].set_title(f"Dendrogram ({method.capitalize()} Linkage)")

    coph_corr, coph_dist = cophenet(Z, pdist(subset_scaled_df))
    axs[i].annotate(
        f"Cophenetic\nCorrelation\n{coph_corr:0.2f}",
        (0.80, 0.80),
        xycoords="axes fraction",
    )
In [64]:
# to create and print a dataframe to compare cophenetic correlations for different linkage methods
df_cc = pd.DataFrame(compare, columns=compare_cols)
df_cc = df_cc.sort_values(by="Cophenetic Coefficient")
df_cc
Out[64]:
Linkage Cophenetic Coefficient

Creating model using sklearn¶

In [56]:
HCmodel = AgglomerativeClustering(n_clusters= 10, affinity='euclidean', linkage='average')  ## Complete the code to define the hierarchical clustering model
HCmodel.fit(hc_df)
Out[56]:
AgglomerativeClustering(affinity='euclidean', linkage='average', n_clusters=10)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
AgglomerativeClustering(affinity='euclidean', linkage='average', n_clusters=10)
In [57]:
subset_scaled_df["HC_Clusters"] = HCmodel.labels_
stocks1["HC_Clusters"] = HCmodel.labels_

Cluster Profiling¶

In [58]:
cluster_profile = stocks1.groupby("HC_Clusters").mean()
In [59]:
cluster_profile["count_in_each_segments"] = (
    stocks1.groupby("HC_Clusters")["Security"].count().values
)
In [60]:
# to display cluster profile
cluster_profile.style.highlight_max(color="lightblue", axis=0)
Out[60]:
  Current Price Price Change Volatility ROE Cash Ratio Net Cash Flow Net Income Earnings Per Share Estimated Shares Outstanding P/E Ratio P/B Ratio KM_segments count_in_each_segments
HC_Clusters                          
0 75.812141 3.903298 1.525584 35.919003 66.775701 44288380.062305 1176573903.426791 2.889798 450041271.641277 29.469645 -2.028330 1.778816 321
1 25.640000 11.237908 1.322355 12.500000 130.500000 16755500000.000000 13654000000.000000 3.295000 2791829362.100000 13.649696 1.508484 5.000000 2
2 152.564999 16.742017 2.314435 4.000000 130.000000 380861000.000000 133320500.000000 0.485000 317332352.950000 337.464244 -9.935778 3.000000 2
3 104.660004 16.224320 1.320606 8.000000 958.000000 592000000.000000 3669000000.000000 1.310000 2800763359.000000 79.893133 5.884467 6.000000 1
4 46.672222 5.166566 1.079367 25.000000 58.333333 -3040666666.666667 14848444444.444445 3.435556 4564959946.222222 15.596051 -6.354193 2.000000 9
5 276.570007 6.189286 1.116976 30.000000 25.000000 90885000.000000 596541000.000000 8.910000 66951851.850000 31.040405 129.064585 7.000000 1
6 4.500000 -38.101788 4.559815 687.000000 22.000000 -3283000000.000000 -14685000000.000000 -22.430000 654703522.100000 28.407929 -1.840528 8.000000 1
7 44.470001 11.397804 2.405408 917.000000 80.000000 698000000.000000 -23528000000.000000 -61.200000 384444444.400000 93.089287 4.970809 8.000000 1
8 675.890015 32.268105 1.460386 4.000000 58.000000 1333000000.000000 596000000.000000 1.280000 465625000.000000 528.039074 3.904430 3.000000 1
9 1274.949951 3.190527 1.268340 29.000000 184.000000 -1671386000.000000 2551360000.000000 50.090000 50935516.070000 25.453183 -1.052429 7.000000 1

Maximum Value of Column at a Particular KM_segment:

KM_segments

0: Count_in_each_segment (321)

1: Net Cash Flow (16,755,500,000.000000)

2: None

3: Cash Ratio (958.000000)

4: Net Income (14,848,444,444.444445) and Estimated Shares Outstanding (4,564,959,946.222222)

5: P/B Ratio (129.064585)

6: Volatility (4.559815) and KM_Segments (8.000000)

7: ROE (917.000000) and KM_Segments (8.000000)

8: Price Change (32.268105 ) and P/E Ratio (528.039074)

9: Current Price (1,274.949951) and Earnings Per Share (50.090000)

In [61]:
# to see the names of the companies in each cluster
for cl in stocks1["HC_Clusters"].unique():
    print("In cluster {}, the following companies are present:".format(cl))
    print(stocks1[stocks1["HC_Clusters"] == cl]["Security"].unique())
    print()
In cluster 0, the following companies are present:
['American Airlines Group' 'AbbVie' 'Abbott Laboratories'
 'Adobe Systems Inc' 'Analog Devices, Inc.' 'Archer-Daniels-Midland Co'
 'Ameren Corp' 'American Electric Power' 'AFLAC Inc'
 'American International Group, Inc.' 'Apartment Investment & Mgmt'
 'Assurant Inc' 'Arthur J. Gallagher & Co.' 'Akamai Technologies Inc'
 'Albemarle Corp' 'Alaska Air Group Inc' 'Allstate Corp' 'Allegion'
 'Applied Materials Inc' 'AMETEK Inc' 'Affiliated Managers Group Inc'
 'Amgen Inc' 'Ameriprise Financial' 'American Tower Corp A'
 'AutoNation Inc' 'Anthem Inc.' 'Aon plc' 'Anadarko Petroleum Corp'
 'Amphenol Corp' 'Arconic Inc' 'Activision Blizzard'
 'AvalonBay Communities, Inc.' 'Broadcom'
 'American Water Works Company Inc' 'American Express Co' 'Boeing Company'
 'Baxter International Inc.' 'BB&T Corporation' 'Bard (C.R.) Inc.'
 'Baker Hughes Inc' 'BIOGEN IDEC Inc.' 'The Bank of New York Mellon Corp.'
 'Ball Corp' 'Bristol-Myers Squibb' 'Boston Scientific' 'BorgWarner'
 'Boston Properties' 'Caterpillar Inc.' 'Chubb Limited' 'CBRE Group'
 'Crown Castle International Corp.' 'Carnival Corp.' 'Celgene Corp.'
 'CF Industries Holdings Inc' 'Citizens Financial Group' 'Church & Dwight'
 'C. H. Robinson Worldwide' 'Charter Communications' 'CIGNA Corp.'
 'Cincinnati Financial' 'Colgate-Palmolive' 'Comerica Inc.'
 'CME Group Inc.' 'Chipotle Mexican Grill' 'Cummins Inc.' 'CMS Energy'
 'Centene Corporation' 'CenterPoint Energy' 'Capital One Financial'
 'Cabot Oil & Gas' 'The Cooper Companies' 'CSX Corp.' 'CenturyLink Inc'
 'Cognizant Technology Solutions' 'Citrix Systems' 'CVS Health'
 'Chevron Corp.' 'Concho Resources' 'Dominion Resources' 'Delta Air Lines'
 'Du Pont (E.I.)' 'Deere & Co.' 'Discover Financial Services'
 'Quest Diagnostics' 'Danaher Corp.' 'The Walt Disney Company'
 'Discovery Communications-A' 'Discovery Communications-C'
 'Delphi Automotive' 'Digital Realty Trust' 'Dun & Bradstreet'
 'Dover Corp.' 'Dr Pepper Snapple Group' 'Duke Energy' 'DaVita Inc.'
 'Devon Energy Corp.' 'eBay Inc.' 'Ecolab Inc.' 'Consolidated Edison'
 'Equifax Inc.' "Edison Int'l" 'Eastman Chemical' 'EOG Resources'
 'Equinix' 'Equity Residential' 'EQT Corporation' 'Eversource Energy'
 'Essex Property Trust, Inc.' 'E*Trade' 'Eaton Corporation'
 'Entergy Corp.' 'Edwards Lifesciences' 'Exelon Corp.' "Expeditors Int'l"
 'Expedia Inc.' 'Extra Space Storage' 'Fastenal Co'
 'Fortune Brands Home & Security' 'Freeport-McMoran Cp & Gld'
 'FirstEnergy Corp' 'Fidelity National Information Services' 'Fiserv Inc'
 'FLIR Systems' 'Fluor Corp.' 'Flowserve Corporation' 'FMC Corporation'
 'Federal Realty Investment Trust' 'First Solar Inc'
 'Frontier Communications' 'General Dynamics'
 'General Growth Properties Inc.' 'Gilead Sciences' 'Corning Inc.'
 'General Motors' 'Genuine Parts' 'Garmin Ltd.' 'Goodyear Tire & Rubber'
 'Grainger (W.W.) Inc.' 'Halliburton Co.' 'Hasbro Inc.'
 'Huntington Bancshares' 'HCA Holdings' 'Welltower Inc.' 'HCP Inc.'
 'Hess Corporation' 'Hartford Financial Svc.Gp.' 'Harley-Davidson'
 "Honeywell Int'l Inc." 'Hewlett Packard Enterprise' 'HP Inc.'
 'Hormel Foods Corp.' 'Henry Schein' 'Host Hotels & Resorts'
 'The Hershey Company' 'Humana Inc.' 'International Business Machines'
 'IDEXX Laboratories' 'Intl Flavors & Fragrances' 'International Paper'
 'Interpublic Group' 'Iron Mountain Incorporated'
 'Intuitive Surgical Inc.' 'Illinois Tool Works' 'Invesco Ltd.'
 'J. B. Hunt Transport Services' 'Jacobs Engineering Group'
 'Juniper Networks' 'Kimco Realty' 'Kimberly-Clark' 'Kinder Morgan'
 'Kansas City Southern' 'Leggett & Platt' 'Lennar Corp.'
 'Laboratory Corp. of America Holding' 'LKQ Corporation'
 'L-3 Communications Holdings' 'Lilly (Eli) & Co.' 'Lockheed Martin Corp.'
 'Alliant Energy Corp' 'Leucadia National Corp.' 'Southwest Airlines'
 'Level 3 Communications' 'LyondellBasell' 'Mastercard Inc.'
 'Mid-America Apartments' 'Macerich' "Marriott Int'l." 'Masco Corp.'
 'Mattel Inc.' "McDonald's Corp." "Moody's Corp" 'Mondelez International'
 'MetLife Inc.' 'Mohawk Industries' 'Mead Johnson' 'McCormick & Co.'
 'Martin Marietta Materials' 'Marsh & McLennan' '3M Company'
 'Monster Beverage' 'Altria Group Inc' 'The Mosaic Company'
 'Marathon Petroleum' 'Merck & Co.' 'Marathon Oil Corp.' 'M&T Bank Corp.'
 'Mettler Toledo' 'Murphy Oil' 'Mylan N.V.' 'Navient' 'Noble Energy Inc'
 'NASDAQ OMX Group' 'NextEra Energy' 'Newmont Mining Corp. (Hldg. Co.)'
 'Newfield Exploration Co' 'Nielsen Holdings'
 'National Oilwell Varco Inc.' 'Norfolk Southern Corp.'
 'Northern Trust Corp.' 'Nucor Corp.' 'Newell Brands'
 'Realty Income Corporation' 'ONEOK' 'Omnicom Group' "O'Reilly Automotive"
 'Occidental Petroleum' "People's United Financial" 'Pitney-Bowes'
 'PACCAR Inc.' 'PG&E Corp.' 'Public Serv. Enterprise Inc.' 'PepsiCo Inc.'
 'Principal Financial Group' 'Procter & Gamble' 'Progressive Corp.'
 'Pulte Homes Inc.' 'Philip Morris International' 'PNC Financial Services'
 'Pentair Ltd.' 'Pinnacle West Capital' 'PPG Industries' 'PPL Corp.'
 'Prudential Financial' 'Phillips 66' 'Quanta Services Inc.'
 'Praxair Inc.' 'PayPal' 'Ryder System' 'Royal Caribbean Cruises Ltd'
 'Regeneron' 'Robert Half International' 'Roper Industries'
 'Range Resources Corp.' 'Republic Services Inc' 'SCANA Corp'
 'Charles Schwab Corporation' 'Spectra Energy Corp.' 'Sealed Air'
 'Sherwin-Williams' 'SL Green Realty' 'Scripps Networks Interactive Inc.'
 'Southern Co.' 'Simon Property Group Inc' 'S&P Global, Inc.'
 'Stericycle Inc' 'Sempra Energy' 'SunTrust Banks' 'State Street Corp.'
 'Skyworks Solutions' 'Southwestern Energy' 'Synchrony Financial'
 'Stryker Corp.' 'Molson Coors Brewing Company' 'Teradata Corp.'
 'Tegna, Inc.' 'Torchmark Corp.' 'Thermo Fisher Scientific' 'TripAdvisor'
 'The Travelers Companies Inc.' 'Tractor Supply Company' 'Tyson Foods'
 'Tesoro Petroleum Co.' 'Total System Services' 'Texas Instruments'
 'Under Armour' 'United Continental Holdings' 'UDR Inc'
 'Universal Health Services, Inc.' 'United Health Group Inc.' 'Unum Group'
 'Union Pacific' 'United Parcel Service' 'United Technologies'
 'Varian Medical Systems' 'Valero Energy' 'Vulcan Materials'
 'Vornado Realty Trust' 'Verisk Analytics' 'Verisign Inc.'
 'Vertex Pharmaceuticals Inc' 'Ventas Inc' 'Waters Corporation'
 'Wec Energy Group Inc' 'Whirlpool Corp.' 'Waste Management Inc.'
 'Williams Cos.' 'Western Union Co' 'Weyerhaeuser Corp.'
 'Wyndham Worldwide' 'Wynn Resorts Ltd' 'Cimarex Energy' 'Xcel Energy Inc'
 'XL Capital' 'Dentsply Sirona' 'Xerox Corp.' 'Xylem Inc.' 'Yahoo Inc.'
 'Yum! Brands Inc' 'Zimmer Biomet Holdings' 'Zions Bancorp' 'Zoetis']

In cluster 5, the following companies are present:
['Alliance Data Systems']

In cluster 2, the following companies are present:
['Alexion Pharmaceuticals' 'Netflix Inc.']

In cluster 8, the following companies are present:
['Amazon.com Inc']

In cluster 7, the following companies are present:
['Apache Corporation']

In cluster 1, the following companies are present:
['Bank of America Corp' 'Intel Corp.']

In cluster 4, the following companies are present:
['Citigroup Inc.' 'Ford Motor' 'JPMorgan Chase & Co.' 'Coca Cola Company'
 'Pfizer Inc.' 'AT&T Inc' 'Verizon Communications' 'Wells Fargo'
 'Exxon Mobil Corp.']

In cluster 6, the following companies are present:
['Chesapeake Energy']

In cluster 3, the following companies are present:
['Facebook']

In cluster 9, the following companies are present:
['Priceline.com Inc']

In [62]:
stocks1.groupby(["HC_Clusters", "GICS Sector"])['Security'].count()
Out[62]:
HC_Clusters  GICS Sector                
0            Consumer Discretionary         37
             Consumer Staples               18
             Energy                         27
             Financials                     45
             Health Care                    38
             Industrials                    53
             Information Technology         29
             Materials                      20
             Real Estate                    27
             Telecommunications Services     3
             Utilities                      24
1            Financials                      1
             Information Technology          1
2            Health Care                     1
             Information Technology          1
3            Information Technology          1
4            Consumer Discretionary          1
             Consumer Staples                1
             Energy                          1
             Financials                      3
             Health Care                     1
             Telecommunications Services     2
5            Information Technology          1
6            Energy                          1
7            Energy                          1
8            Consumer Discretionary          1
9            Consumer Discretionary          1
Name: Security, dtype: int64
In [63]:
plt.figure(figsize=(20, 20))
plt.suptitle("Boxplot of numerical variables for each cluster")

for i, variable in enumerate(num_col):
    plt.subplot(3, 4, i + 1)
    sns.boxplot(data=stocks1, x="HC_Clusters", y=variable)

plt.tight_layout(pad=2.0)

Cluster 0 Characteristics

  • Current Price: Low
  • Price Change: Wide
  • Volatility: Wide
  • ROE: Varied
  • Cash Ratio: Varied
  • Net Cash Flow: Neutral
  • Net Income: Varied
  • Earnings Per Share: Varied
  • P/E Ratio: Low
  • Number of Shares Outstanding: Low

Cluster 1 Characteristics

  • Current Price: Low
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: Low
  • Cash Ratio: Medium
  • Net Cash Flow: Positive
  • Net Income: High
  • Earnings Per Share: Low
  • P/E Ratio: Low
  • Number of Shares Outstanding: High

Cluster 2 Characteristics

  • Current Price: High
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: Low
  • Cash Ratio: Medium
  • Net Cash Flow: Positive
  • Net Income: Low
  • Earnings Per Share: Low
  • P/E Ratio: High
  • Number of Shares Outstanding: Low

Cluster 3 Characteristics

  • Current Price: Medium
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: Low
  • Cash Ratio: High
  • Net Cash Flow: Positive
  • Net Income: Low
  • Earnings Per Share: Low
  • P/E Ratio: Low
  • Number of Shares Outstanding: High

Cluster 4 Characteristics

  • Current Price: Low
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: Low
  • Cash Ratio: Medium
  • Net Cash Flow: Negative
  • Net Income: High
  • Earnings Per Share: Low
  • P/E Ratio: Low
  • Number of Shares Outstanding: High

Cluster 5 Characteristics

  • Current Price: Low
  • Price Change: Narrow
  • Volatility: Narrow
  • Cash Ratio: Low
  • Net Cash Flow: Positive
  • Net Income: Low
  • Earnings Per Share: Low
  • P/E Ratio: Low
  • Number of Shares Outstanding: Low

Cluster 6 Characteristics

  • Current Price: Low
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: High
  • Cash Ratio: Low
  • Net Cash Flow: Negative
  • Net Income: Low (Negative)
  • Earnings Per Share: Low (Negative)
  • P/E Ratio: Low
  • Number of Shares Outstanding: Low

Cluster 7 Characteristics

  • Current Price: Low
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: High
  • Stock Analysis of Cluster:
  • Cash Ratio: Low
  • Net Cash Flow: Positive
  • Net Income: Low (Negative)
  • Earnings Per Share: Low (Negative)
  • P/E Ratio: Low
  • Number of Shares Outstanding: Low

Cluster 8 Characteristics

  • Current Price: High
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: Low
  • Stock Analysis of Cluster:
  • Cash Ratio: Low
  • Net Cash Flow: Positive
  • Net Income: Low
  • Earnings Per Share: Low
  • P/E Ratio: High
  • Number of Shares Outstanding: Low

Cluster 9 Characteristics

  • Current Price: High
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: Low
  • Stock Analysis of Cluster:
  • Cash Ratio: Low
  • Net Cash Flow: Negative
  • Net Income: Low
  • Earnings Per Share: High
  • P/E Ratio: Low
  • Number of Shares Outstanding: Low

Actionable Insights and Recommendations¶

Cluster 0 Characteristics

       Consumer Discretionary         37
         Consumer Staples               18
         Energy                         27
         Financials                     45
         Health Care                    38
         Industrials                    53
         Information Technology         29
         Materials                      20
         Real Estate                    27
         Telecommunications Services     3
         Utilities                      24
  • Current Price: Low
  • Price Change: Wide
  • Volatility: Wide
  • ROE: Varied
  • Cash Ratio: Varied
  • Net Cash Flow: Neutral
  • Net Income: Varied
  • Earnings Per Share: Varied
  • P/E Ratio: Low
  • Number of Shares Outstanding: Low

Cluster 1 Characteristics

       Financials                      1
         Information Technology          1
  • Current Price: Low
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: Low
  • Cash Ratio: Medium
  • Net Cash Flow: Positive
  • Net Income: High
  • Earnings Per Share: Low
  • P/E Ratio: Low
  • Number of Shares Outstanding: High

Cluster 2 Characteristics

       Health Care                     1
         Information Technology          1
  • Current Price: High
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: Low
  • Cash Ratio: Medium
  • Net Cash Flow: Positive
  • Net Income: Low
  • Earnings Per Share: Low
  • P/E Ratio: High
  • Number of Shares Outstanding: Low

Cluster 3 Characteristics

        Information Technology          1
  • Current Price: Medium
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: Low
  • Cash Ratio: High
  • Net Cash Flow: Positive
  • Net Income: Low
  • Earnings Per Share: Low
  • P/E Ratio: Low
  • Number of Shares Outstanding: High

Cluster 4 Characteristics

       Consumer Discretionary          1
         Consumer Staples                1
         Energy                          1
         Financials                      3
         Health Care                     1
         Telecommunications Services     2
  • Current Price: Low
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: Low
  • Cash Ratio: Medium
  • Net Cash Flow: Negative
  • Net Income: High
  • Earnings Per Share: Low
  • P/E Ratio: Low
  • Number of Shares Outstanding: High

Cluster 5 Characteristics

       Information Technology          1
  • Current Price: Low
  • Price Change: Narrow
  • Volatility: Narrow
  • Cash Ratio: Low
  • Net Cash Flow: Positive
  • Net Income: Low
  • Earnings Per Share: Low
  • P/E Ratio: Low
  • Number of Shares Outstanding: Low

Cluster 6 Characteristics

       Energy                          1

Cluster 6 Characteristics

  • Current Price: Low
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: High
  • Cash Ratio: Low
  • Net Cash Flow: Negative
  • Net Income: Low (Negative)
  • Earnings Per Share: Low (Negative)
  • P/E Ratio: Low
  • Number of Shares Outstanding: Low

Cluster 7 Characteristics

       Energy                          1
  • Current Price: Low
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: High
  • Stock Analysis of Cluster:
  • Cash Ratio: Low
  • Net Cash Flow: Positive
  • Net Income: Low (Negative)
  • Earnings Per Share: Low (Negative)
  • P/E Ratio: Low
  • Number of Shares Outstanding: Low

Cluster 8 Characteristics

       Consumer Discretionary          1
  • Current Price: High
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: Low
  • Stock Analysis of Cluster:
  • Cash Ratio: Low
  • Net Cash Flow: Positive
  • Net Income: Low
  • Earnings Per Share: Low
  • P/E Ratio: High
  • Number of Shares Outstanding: Low

Cluster 9 Characteristics

       Consumer Discretionary          1
  • Current Price: High
  • Price Change: Narrow
  • Volatility: Narrow
  • ROE: Low
  • Stock Analysis of Cluster:
  • Cash Ratio: Low
  • Net Cash Flow: Negative
  • Net Income: Low
  • Earnings Per Share: High
  • P/E Ratio: Low
  • Number of Shares Outstanding: Low

Insights:¶

  • Review Sectors quarter over quarter to confirm success of:

Net Cash Flow (Cluster 1)

Cash Ratio (Cluster 3)

Net Income and Estimated Shares Outstanding (Cluster 4)

P/B Ratio (Cluster 5)

Volatility (Cluster 6)

ROE (cluster 7)

Price Change and P/E Ratio (Cluster 8)

Current Price and Earnings Per Share (Cluster 9)

  • Optimal stocks will likely share relatively close characteristics and likely be in the same sector.